Augusta
Unsupervised decoding of encoded reasoning using language model interpretability
As large language models become increasingly capable, there is growing concern that they may develop reasoning processes that are encoded or hidden from human oversight. To investigate whether current interpretability techniques can penetrate such encoded reasoning, we construct a controlled testbed by fine-tuning a reasoning model (DeepSeek-R1-Distill-Llama-70B) to perform chain-of-thought reasoning in ROT-13 encryption while maintaining intelligible English outputs. We evaluate mechanistic interpretability methods--in particular, logit lens analysis--on their ability to decode the model's hidden reasoning process using only internal activations. We show that logit lens can effectively translate encoded reasoning, with accuracy peaking in intermediate-to-late layers. Finally, we develop a fully unsupervised decoding pipeline that combines logit lens with automated paraphrasing, achieving substantial accuracy in reconstructing complete reasoning transcripts from internal model representations. These findings suggest that current mechanistic interpretability techniques may be more robust to simple forms of encoded reasoning than previously understood. Our work provides an initial framework for evaluating interpretability methods against models that reason in non-human-readable formats, contributing to the broader challenge of maintaining oversight over increasingly capable AI systems.
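For readers unfamiliar with the encoding used in this testbed, the following is a minimal sketch of ROT-13 itself, not of the paper's fine-tuning or logit lens pipeline. The helper name `rot13` and the sample sentence are illustrative assumptions and do not come from the paper. ROT-13 shifts each letter 13 places, so applying it twice returns the original text.

```python
import codecs


def rot13(text: str) -> str:
    """Apply ROT-13 to text (hypothetical helper, for illustration only).

    Letters are rotated 13 places; digits, spaces, and punctuation
    are left unchanged.
    """
    return codecs.encode(text, "rot_13")


# Illustrative reasoning step (not taken from the paper).
original = "First, add the two numbers."
encoded = rot13(original)

# ROT-13 is its own inverse, so encoding again decodes.
decoded = rot13(encoded)

print(encoded)
print(decoded)
```

Because the cipher is a fixed letter-for-letter substitution, a decoded transcript can be checked against the encoded chain of thought one character at a time.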
- North America > United States > Illinois > Sangamon County > Springfield (0.14)
- North America > United States > Illinois > Cook County > Chicago (0.07)
- North America > United States > California > Sacramento County > Sacramento (0.05)
Rise Of The Robot Bees: Tiny Drones Turned Into Artificial Pollinators
An artist's illustration shows how a remote-controlled drone might one day be used to pollinate flowers. Courtesy of Dr. Eijiro Miyako. Near Esparto, in the beautiful Capay Valley region of central California, 1,400 young almond trees flourish in a century-old orchard overlooking the hills. Since November, they've stood in perfect rows without a hint of foliage -- resting, naked and dormant, for the upcoming growing season. Their branches now swell with bright pastel blooms in preparation for pollination. Like most almond growers, Brian Paddock, owner of Capay Hills Orchard, relies on bees to provide this important aspect of crop development.
- North America > United States > Minnesota (0.05)
- North America > United States > Maine > Kennebec County > Augusta (0.05)
- North America > United States > California > Riverside County > Riverside (0.05)
- Asia > Japan (0.05)
Teaching Machines to Learn on Their Own
Steve Mirsky: Welcome to Scientific American's Science Talk, posted on November 10, 2015. A short episode today, for which I'll turn it over now to Scientific American's associate tech editor, Larry Greenemeier. Larry Greenemeier: Computers have always been good at doing things that are really complicated for us humans. On the other hand, computers have a really hard time recognizing a particular voice or face in a crowd, something most kids learn to do before they're even out of diapers. But things are changing fast. Over the next decade or so, machines will more easily mimic inherently human abilities.
- North America > United States > New York (0.05)
- North America > United States > Maine > Kennebec County > Augusta (0.05)
- North America > United States > California > Santa Clara County > Palo Alto (0.05)
- Information Technology > Communications > Mobile (0.50)
- Information Technology > Artificial Intelligence > Machine Learning (0.37)